Improved Bounds for Mixing Rates of Markov Chains and Multicommodity Flow
Alistair Sinclair
Abstract
The paper is concerned with tools for the quantitative analysis of finite Markov chains whose states are combinatorial structures. Chains of this kind have algorithmic applications in many areas, including random sampling, approximate counting, statistical physics and combinatorial optimisation. The efficiency of the resulting algorithms depends crucially on the mixing rate of the chain, i.e., the time taken for it to reach its stationary or equilibrium distribution. The paper presents a new upper bound on the mixing rate, based on the solution to a multicommodity flow problem in the Markov chain viewed as a graph. The bound gives sharper estimates for the mixing rate of several important complex Markov chains. As a result, improved bounds are obtained for the runtimes of randomised approximation algorithms for various problems, including computing the permanent of a 0-1 matrix, counting matchings in graphs, and computing the partition function of a ferromagnetic Ising system. Moreover, solutions to the multicommodity flow problem are shown to capture the mixing rate quite closely: thus, under fairly general conditions, a Markov chain is rapidly mixing if and only if it supports a flow of low cost.

1 Summary

In recent years, Markov chain simulation has emerged as a powerful algorithmic paradigm. Its chief application is to the random sampling of combinatorial structures from a specified probability distribution. Such a sampling procedure lies at the heart of efficient probabilistic algorithms for a wide variety of problems, such as approximating the size of combinatorially defined sets, estimating the expectation of certain operators in statistical physics, and combinatorial optimisation by stochastic search. The algorithmic idea is simple. Suppose we wish to sample the elements of a large but finite set $X$ of structures from a specified probability distribution $\pi$.
First, construct a Markov chain whose states are the elements of $X$ and which converges asymptotically to the stationary or equilibrium distribution $\pi$ over $X$; it is usually possible to do this using as transitions simple random perturbations of the structures in $X$. Then, starting from an arbitrary state, simulate the chain until it is close to equilibrium; the distribution of the final state will be close to the desired distribution $\pi$.

To take a typical example, let $H$ be a connected graph and $X$ the set of spanning trees of $H$, and suppose we wish to sample elements of $X$ from a uniform distribution. Consider the Markov chain MC($H$) with state space $X$ which, given a spanning tree $T \in X$, makes transitions as follows: select uniformly at random an edge $e$ of $H$ which does not belong to $T$, add $e$ to $T$, thereby creating a single cycle $C$, and finally remove an edge of $C$ uniformly at random to create a new spanning tree $T'$. It is not hard to check that this Markov chain converges to the uniform distribution over $X$.

Analysing the efficiency of the above technique in a given application presents a considerable challenge. The key issue is to determine the mixing rate of the chain, i.e., the number of simulation steps needed to ensure that it is sufficiently close to its equilibrium distribution $\pi$. An efficient algorithm can result only if this number is reasonably small, which usually means dramatically less than the size of the state space $X$ itself. For example, in the spanning tree problem above we would want MC($H$) to reach equilibrium in time bounded by some polynomial in $n$, the size of the problem instance $H$; however, the number of states $|X|$ will typically be exponential in $n$. Informally, we will call chains having this property rapidly mixing. (More correctly, this is a property of families of chains, such as MC($H$), parameterised on problem instances.) The classical theory of Markov chains has not been greatly concerned with a quantitative study of the approach to equilibrium.
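The transition rule of MC($H$) is easy to make concrete. The following sketch is our own illustrative implementation (the function name, data representation and test graph are not from the paper): it adds a random non-tree edge, locates the unique cycle so created via the tree path between the edge's endpoints, and deletes a uniformly random cycle edge.

```python
import random
from collections import deque

def mc_step(edges, tree, rng=random):
    """One transition of the spanning-tree chain MC(H): add a random
    non-tree edge e of H, then delete a uniformly random edge of the
    unique cycle so created.  `edges` is the edge set of H and `tree`
    the current spanning tree, both as collections of frozenset pairs."""
    non_tree = [e for e in edges if e not in tree]
    if not non_tree:                      # H is itself a tree: no move
        return set(tree)
    e = rng.choice(non_tree)
    u, v = tuple(e)
    adj = {}
    for a, b in map(tuple, tree):
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent = {u: None}                    # BFS for the tree path u -> v
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            break
        for y in adj.get(x, []):
            if y not in parent:
                parent[y] = x
                queue.append(y)
    cycle = [e]                           # the unique cycle C: e + tree path
    x = v
    while parent[x] is not None:
        cycle.append(frozenset({x, parent[x]}))
        x = parent[x]
    removed = rng.choice(cycle)           # may be e itself, giving T' = T
    return (set(tree) | {e}) - {removed}

# One step on the 4-cycle graph, starting from the path tree {01, 12, 23}.
edges = [frozenset(p) for p in [(0, 1), (1, 2), (2, 3), (3, 0)]]
tree = mc_step(edges, set(edges[:3]), random.Random(1))
```

Note that the removed edge may be $e$ itself, in which case the tree is unchanged; this self-loop is what makes the chain aperiodic.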
This has led to the recent development of new analytic tools, based on coupling, stopping times and group representation theory, which have been successfully applied to chains with a regular structure, such as random walks on certain special graphs or groups. The book by Diaconis [7] gives an excellent survey. Markov chains arising in the combinatorial applications mentioned above are typically much more complex, however. The first analyses of such chains were made possible using a quantity called the conductance [23, 24]. Suppose we view a (reversible) Markov chain as a weighted graph $G$, whose vertices are states and whose edges are transitions. Then the conductance $\Phi(G)$ is essentially the edge expansion of $G$. Equivalently, $\Phi$ may be viewed as the probability that the chain in equilibrium escapes from a subset $S$ of the state space in one step, minimised over small subsets $S$. (Precise definitions of this and other quantities are given in later sections.) It is intuitively reasonable that $\Phi$ should be related to the mixing rate: if the above escape probability is small for some $S$, then the cut edges separating $S$ from $X \setminus S$ constitute a "constriction", or bottleneck, which prevents rapid convergence to equilibrium. Conversely, a large value of $\Phi$ means that the chain cannot get trapped by any small region of the space, and hence should be rapidly mixing.

A useful piece of technology for obtaining lower bounds on $\Phi$ in complex examples was developed in [11, 23]. The idea is to construct a canonical path $\gamma_{xy}$ in the graph $G$ between each ordered pair of distinct states $x$ and $y$. If the paths can be chosen in such a way that no edge is overloaded by paths, then the chain cannot contain a constriction, so $\Phi$ is not too small. (The existence of a constriction between $S$ and $X \setminus S$ would imply that any choice of paths must overload the edges in the constriction.)
More precisely, suppose $\rho$ is the maximum loading of an edge by paths; then it is not hard to show (see Theorem 3 of Section 2) that $\Phi \ge (2\rho)^{-1}$, so $\rho$ does indeed provide a bound on the mixing rate of the chain. The power of this observation lies in the fact that a good collection $\Gamma = \{\gamma_{xy}\}$ of canonical paths can sometimes be constructed for which $\rho$ can be bounded rather tightly; indeed, the quantity $\rho$ arises very naturally from a combinatorial encoding technique, as explained in Section 3.

In a recent paper [8], Diaconis and Stroock observed that path arguments similar to that described above can lead directly to bounds on the mixing rate, independently of the conductance $\Phi$. In this paper, we present a new bound which is a modification of that of Diaconis and Stroock. The new bound also involves the maximum loading $\rho$ of an edge by paths, but takes into account the lengths of the paths. A simplified form of the bound (Corollary 6 of Section 2) relates the mixing rate to the product $\rho\ell$ for a collection of paths $\Gamma$, where $\ell$ is the length of a longest path in $\Gamma$. This bound turns out to be sharper than the conductance-based bound above when the maximum path length $\ell$ is small compared to $\rho$.

In Section 3 of the paper, we illustrate the effectiveness of the new bound by obtaining significantly improved estimates for the mixing rate of several important complex Markov chains, which have been used in the design of algorithms for problems involving monomer-dimer systems, matchings in graphs, the Ising model, and almost uniform generation of combinatorial structures. The factors saved in the mixing rate translate directly to the runtime of the algorithms that use the chains. These improvements apparently do not follow from the similar bound given by Diaconis and Stroock, because the Markov chains in question have widely differing weights on their edges. (The two bounds are equivalent if the edge weights are uniform, e.g., in the case of random walk on a graph.)
Finally, in Section 4, we address the problem of characterising the rapid mixing property for reversible Markov chains. It is already known that the conductance characterises rapid mixing, in the sense that $\Phi^{-1}$ essentially measures the mixing rate up to a polynomial factor (in fact, a square). In view of the foregoing results, it is natural to ask whether a similar characterisation in terms of the path measure $\rho$ also holds. This would mean that, whenever a Markov chain is rapidly mixing, a proof of this fact using a path argument exists. We are able to answer this question in the affirmative, provided the definition of $\rho$ is generalised in a natural way to allow multiple rather than canonical paths between pairs of states. This leads us to consider a multicommodity flow problem in the graph $G$ describing the Markov chain, in which a certain quantity of some commodity $(x, y)$ is to be transported from $x$ to $y$ for all pairs $x, y \in X$. For a given flow, $\rho$ may then be interpreted as the maximum total flow through any edge $e$ as a fraction of its weight, or capacity. Minimising over all possible flows, we get a quantity which we call the resistance $\rho(G)$ of the Markov chain. The main result of this section states that, if a reversible Markov chain is close to equilibrium after $\tau$ steps, then its resistance cannot exceed $O(\tau)$. Thus the resistance, like the conductance, does indeed characterise the rapid mixing property. We also observe that the quantities $\Phi^{-1}$ and $\rho$ are in fact equal up to a factor $O(\log N)$, where $N$ is the number of states. This is actually an approximate max-flow min-cut theorem for the multicommodity flow problem, and is a natural generalisation of a result obtained in a different context by Leighton and Rao [17].

2 Bounds on the mixing rate

We assume familiarity with the elementary theory of Markov chains: see, e.g., [15] for a more detailed treatment. Let $X$ be a finite set, and $P$ the transition matrix of a discrete-time Markov chain on state space $X$.
We assume throughout that $P$ is irreducible (i.e., that all states communicate) and reversible with respect to a probability distribution $\pi$ on $X$, i.e., that it satisfies the detailed balance condition

    Q(x, y) \equiv \pi(x) P(x, y) = \pi(y) P(y, x)   for all x, y \in X.   (1)

Condition (1) implies that $\pi$ is a stationary or equilibrium distribution for $P$, i.e., $\pi P = \pi$. If in addition $P$ is aperiodic, the distribution of the state at time $t$ converges pointwise to $\pi$ as $t \to \infty$, regardless of the initial state. In this case the chain is called ergodic. Simulating an ergodic chain for sufficiently many steps, starting from an arbitrary initial state and noting the final state, provides an algorithm for sampling elements of $X$ from a distribution that is arbitrarily close to $\pi$.

We note that the above framework is quite general for the purposes of the combinatorial applications mentioned in the previous section. In particular, it is usually a straightforward matter to make condition (1) hold using as transitions simple random perturbations of the structures in $X$, such as those employed in the spanning tree example given earlier. It is convenient to identify a reversible Markov chain with a weighted undirected graph $G$ on vertex set $X$, with an edge of weight $Q(x, y)$ connecting vertices $x$ and $y$ iff $Q(x, y) > 0$. (Thus the graph may contain self-loops.) Note that this graph is always connected and uniquely specifies the chain. As is well known, $P$ has real eigenvalues $1 = \lambda_0 > \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{N-1} \ge -1$, where $N = |X|$; $P$ is ergodic iff $\lambda_{N-1} > -1$. For an ergodic chain, the rate of convergence to $\pi$ is governed by the second-largest eigenvalue in absolute value, $\lambda_{\max} = \max\{\lambda_1, |\lambda_{N-1}|\}$. To make this statement precise, let $x$ be the state at time $t = 0$ and denote by $P^t(x, \cdot)$ the distribution of the state at time $t$.
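Condition (1) is easy to verify mechanically. As a minimal sketch (the weighted triangle below is our own illustrative choice, not from the paper), consider a random walk on a weighted graph, with $P(x, y) = w(x, y)/w(x)$ and $\pi(x) \propto w(x) = \sum_y w(x, y)$; such a chain always satisfies detailed balance:

```python
# Reversibility check: for a random walk on a weighted graph,
# P(x, y) = w(x, y) / w(x) with w(x) = sum_y w(x, y), and
# pi(x) = w(x) / sum_z w(z) satisfies detailed balance (1).
# The graph below (a weighted triangle) is an arbitrary small example.
w = {(0, 1): 2.0, (1, 0): 2.0,
     (1, 2): 1.0, (2, 1): 1.0,
     (0, 2): 3.0, (2, 0): 3.0}
states = [0, 1, 2]

deg = {x: sum(w.get((x, y), 0.0) for y in states) for x in states}
total = sum(deg.values())
pi = {x: deg[x] / total for x in states}
P = {(x, y): w.get((x, y), 0.0) / deg[x] for x in states for y in states}

for x in states:
    for y in states:
        # Q(x, y) = pi(x) P(x, y) must equal Q(y, x) = pi(y) P(y, x)
        assert abs(pi[x] * P[(x, y)] - pi[y] * P[(y, x)]) < 1e-12
```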
The variation distance at time $t$ with initial state $x$ is

    \Delta_x(t) = \max_{S \subseteq X} |P^t(x, S) - \pi(S)| = \frac{1}{2} \sum_{y \in X} |P^t(x, y) - \pi(y)|.

We will measure the rate of convergence using the function $\tau_x$, defined for $\epsilon > 0$ by

    \tau_x(\epsilon) = \min\{t : \Delta_x(t') \le \epsilon \ \text{for all}\ t' \ge t\}.

Proposition 1  The quantity $\tau_x(\epsilon)$ satisfies

    (i)  \tau_x(\epsilon) \le (1 - \lambda_{\max})^{-1} \big( \ln \pi(x)^{-1} + \ln \epsilon^{-1} \big);
    (ii) \max_{x \in X} \tau_x(\epsilon) \ge \tfrac{1}{2} \lambda_{\max} (1 - \lambda_{\max})^{-1} \ln (2\epsilon)^{-1}.

Part (i) follows from [8, Proposition 3] and gives an upper bound on the time to reach equilibrium from a given initial state $x$ in terms of $\lambda_{\max}$ and $\pi(x)$. The converse, part (ii), which is a discrete-time version of [1, Proposition 8], says that convergence cannot be rapid unless $\lambda_{\max}$ is bounded away from 1. (Note that in the latter bound there is a maximisation over initial states: it is possible for a chain to converge fast from certain states even when $\lambda_{\max}$ is close to 1. However, even if such a state exists, finding it requires more detailed information about the chain than is usually available in the more complex examples of interest to us.)

Results analogous to Proposition 1 hold for measures other than the variation distance. For example, [23, 24] give bounds in terms of the relative pointwise distance, defined by $\Delta^{\mathrm{rpd}}_x(t) = \max_{y \in X} |P^t(x, y) - \pi(y)| / \pi(y)$. In the remainder of this paper, we will ignore the technical issues arising from the choice of initial state. Proposition 1 then shows that we can identify the rapid mixing property with a large value of the spectral gap $1 - \lambda_{\max}$. Moreover, in practice the smallest eigenvalue $\lambda_{N-1}$ is unimportant: a crude approach is to add a holding probability of $\frac{1}{2}$ to every state, i.e., replace $P$ by $\frac{1}{2}(I + P)$, where $I$ is the $N \times N$ identity matrix. This ensures that all eigenvalues are non-negative, while decreasing the spectral gap $1 - \lambda_1$ only by a factor of 2. We therefore focus attention on the second eigenvalue $\lambda_1$.
As indicated in the previous section, the first upper bounds on $\lambda_1$ for complex Markov chains were based on the conductance [23, 24], defined by

    \Phi(G) = \min_{S \subseteq X,\ 0 < \pi(S) \le 1/2} \frac{Q(S, \bar S)}{\pi(S)},   (2)

where $G$ is the weighted graph describing the chain and $Q(S, \bar S)$ denotes the sum of $Q(x, y)$ over edges $\{x, y\}$ in $G$ with $x \in S$ and $y \in \bar S = X \setminus S$. The conductance may be viewed as a weighted version of the edge expansion of $G$. Alternatively, since $Q(S, \bar S) = \sum_{x \in S, y \in \bar S} \pi(x) P(x, y)$, the quotient in (2) is just the conditional probability that the chain in equilibrium escapes from the subset $S$ of the state space in one step, given that it is initially in $S$. Thus $\Phi$ measures the ability of the chain to escape from any small region of the state space, and hence to make rapid progress to equilibrium. The following result formalising this intuition is from [23, 24]; see also [2, 3, 4, 16, 19, 21] for related results.

Theorem 2  The second eigenvalue $\lambda_1$ of a reversible Markov chain satisfies

    1 - 2\Phi \le \lambda_1 \le 1 - \frac{\Phi^2}{2}.

Note that $\Phi$ characterises the rapid mixing property: a Markov chain is rapidly mixing, in the sense of the previous section, if and only if $\Phi \ge 1/\mathrm{poly}(n)$, where $n$ is the problem size.

In order to apply Theorem 2, it is necessary to estimate the conductance $\Phi$. Since we are usually more interested in positive results, lower bounds on $\Phi$ are generally of greater interest, and we focus on them for most of the rest of this paper. (We shall consider negative results in Section 4.) In some cases such a bound can be obtained directly, using elementary arguments [16, 24] or geometric ideas [9, 14]. However, in many important applications the only known handle on $\Phi$ is via the canonical path approach sketched in the previous section. Thus we attempt to construct a family $\Gamma = \{\gamma_{xy}\}$ of simple paths in $G$, one between each ordered pair of distinct states $x$ and $y$, such that no edge is overloaded by paths.
The maximum loading of any edge is measured by the quantity

    \rho(\Gamma) = \max_e \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y),   (3)

where the maximum is over oriented edges $e$ in $G$ (i.e., transitions of the Markov chain), and $Q(e) = Q(u, v)$ if $e = (u, v)$. Note that we may view the Markov chain as a flow network, in which $\pi(x)\pi(y)$ units of flow travel from $x$ to $y$ along $\gamma_{xy}$, and $Q(e)$ plays the role of the capacity of $e$. The quantity $\rho$ then measures the maximum flow along any edge as a fraction of its capacity. We shall pursue this analogy further in Section 4. The following simple result confirms our intuition that the existence of a good choice of paths should imply a large value for the conductance.

Theorem 3  For any reversible Markov chain, and any choice of canonical paths,

    \Phi \ge \frac{1}{2\rho}.

Proof: Let $S \subseteq X$ be a subset with $0 < \pi(S) \le \frac{1}{2}$ which minimises the quotient $Q(S, \bar S)/\pi(S)$. For any choice of paths, the total net flow crossing the cut from $S$ to $\bar S$ is $\pi(S)\pi(\bar S)$; moreover, the aggregated capacity of the cut edges $(x, y)$, with $x \in S$ and $y \in \bar S$, is just $Q(S, \bar S)$. Hence there must exist a cut edge $e$ with

    \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y) \ \ge\ \frac{\pi(S)\pi(\bar S)}{Q(S, \bar S)} \ \ge\ \frac{\pi(S)}{2 Q(S, \bar S)} \ =\ \frac{1}{2\Phi},

where the second inequality holds because $\pi(\bar S) \ge \frac{1}{2}$.

Theorems 2 and 3 immediately yield the following bound on $\lambda_1$:

Corollary 4  For any reversible Markov chain, and any choice of canonical paths, the second eigenvalue $\lambda_1$ satisfies

    \lambda_1 \le 1 - \frac{1}{8\rho^2}.

In recent work [8], Diaconis and Stroock observed that bounds on $\lambda_1$ can be obtained directly in terms of canonical paths, without appealing to the conductance bound of Theorem 2. This latter bound is potentially rather weak because of the appearance of the square, so a direct approach may lead to sharper estimates for $\lambda_1$. We now present a modified version of Diaconis and Stroock's bound which is apparently more useful than theirs in many combinatorial applications. In the next section, we will illustrate the effectiveness of the bound by obtaining improved estimates for the second eigenvalue of several important Markov chains.
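As a concrete sanity check on Theorems 2 and 3 and Corollary 4, all the quantities involved can be computed exactly for a small chain. The sketch below (the 9-cycle and the brute-force subset enumeration are our own illustrative choices, not from the paper) uses simple random walk on an odd cycle, whose second eigenvalue is known to be $\cos(2\pi/N)$:

```python
import math
from itertools import combinations

# Simple random walk on the odd cycle Z_N: pi is uniform, every oriented
# edge has Q(e) = 1/(2N), and the canonical path gamma_xy follows the
# shorter arc from x to y.  The second eigenvalue is cos(2*pi/N).
N = 9
pi = 1.0 / N
Q = 1.0 / (2 * N)

load = {}                                   # oriented edge -> path loading
for x in range(N):
    for y in range(N):
        if x != y:
            d = (y - x) % N
            step, length = (1, d) if d <= N // 2 else (-1, N - d)
            v = x
            for _ in range(length):
                e = (v, (v + step) % N)
                load[e] = load.get(e, 0.0) + pi * pi
                v = (v + step) % N
rho = max(load.values()) / Q

# Brute-force conductance (2): minimise over S with 0 < pi(S) <= 1/2.
phi = float("inf")
for k in range(1, N // 2 + 1):
    for S in combinations(range(N), k):
        Sbar = set(range(N)) - set(S)
        cut = sum(Q for u in S for v in Sbar if (u - v) % N in (1, N - 1))
        phi = min(phi, cut / (k * pi))

lam1 = math.cos(2 * math.pi / N)
assert phi >= 1 / (2 * rho) - 1e-12         # Theorem 3
assert 1 - 2 * phi <= lam1 <= 1 - phi * phi / 2   # Theorem 2
assert lam1 <= 1 - 1 / (8 * rho * rho)      # Corollary 4
```

For $N = 9$ this gives $\Phi = 1/4$ and $\rho = 20/9$, so Theorem 3's guarantee $1/(2\rho) = 0.225$ is close to the true conductance.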
To state the new bound, we modify the measure $\rho$ to take into account the lengths of the paths. For a given collection $\Gamma = \{\gamma_{xy}\}$ of canonical paths, the key quantity is now

    \bar\rho(\Gamma) = \max_e \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y)|\gamma_{xy}|,   (4)

where $|\gamma_{xy}|$ denotes the length (i.e., number of edges) of the path $\gamma_{xy}$.

Theorem 5  For any reversible Markov chain, and any choice of canonical paths, the second eigenvalue $\lambda_1$ satisfies

    \lambda_1 \le 1 - \frac{1}{\bar\rho}.

Proof: Let $L = I - P$, so that the eigenvalues of $L$ are $\mu_i = 1 - \lambda_i$. Following [8], the variational characterisation of $\mu_1 = 1 - \lambda_1$ is

    1 - \lambda_1 = \inf_\phi \frac{\sum_{x,y \in X} (\phi(x) - \phi(y))^2 Q(x, y)}{\sum_{x,y \in X} (\phi(x) - \phi(y))^2 \pi(x)\pi(y)},   (5)

where the infimum is taken over all non-constant functions $\phi : X \to \mathbb{R}$. (The constant functions are the only eigenfunctions of $L$ with eigenvalue $\mu_0 = 0$.) Now for any $\phi$, and any choice of canonical paths $\Gamma$, the denominator of (5) may be bounded as follows:

    \sum_{x,y} (\phi(x) - \phi(y))^2 \pi(x)\pi(y)
      = \sum_{x,y} \pi(x)\pi(y) \Big( \sum_{e \in \gamma_{xy}} (\phi(e^+) - \phi(e^-)) \Big)^2
      \le \sum_{x,y} \pi(x)\pi(y) |\gamma_{xy}| \sum_{e \in \gamma_{xy}} (\phi(e^+) - \phi(e^-))^2
      = \sum_e (\phi(e^+) - \phi(e^-))^2 \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y)|\gamma_{xy}|
      \le \sum_e (\phi(e^+) - \phi(e^-))^2 \, Q(e)\, \bar\rho(\Gamma)
      = \bar\rho(\Gamma) \sum_{x,y} Q(x, y) (\phi(x) - \phi(y))^2.

Here $e^-$ and $e^+$ denote the start and end vertices of the oriented edge $e$, and the first inequality is Cauchy-Schwarz. The result now follows from (5).

The following simplified form of Theorem 5 is often useful.

Corollary 6  For any reversible Markov chain, and any choice of canonical paths $\Gamma$, the second eigenvalue $\lambda_1$ satisfies

    \lambda_1 \le 1 - \frac{1}{\rho\ell},

where $\ell = \ell(\Gamma)$ is the length of a longest path in $\Gamma$.

Corollary 6 may be applied in the same situations as Corollary 4, by constructing paths and estimating the quantity $\rho$. Frequently, however, the maximum path length $\ell$ will be significantly less than the estimate obtained for $\rho$; in such cases, Corollary 6 will give a sharper bound than Corollary 4. The improved bounds presented in the next section are all based on this observation.

Remark: Diaconis and Stroock [8] give a bound which is similar to that of Theorem 5 but which uses a different measure of path length.
To get their bound, we replace $|\gamma_{xy}|$ in the definition (4) of $\bar\rho$ by the quantity

    |\gamma_{xy}|_{Q,e} = \sum_{e' \in \gamma_{xy}} \frac{Q(e)}{Q(e')},

with everything else defined as before. Let $\bar\rho_{DS}$ be the measure obtained in this way. Diaconis and Stroock's bound [8, Proposition 1] may be stated as

    \lambda_1 \le 1 - \frac{1}{\bar\rho_{DS}}.   (6)

The examples in the next section indicate that $\bar\rho$, or $\rho\ell$, may be a more useful quantity to work with in practice than $\bar\rho_{DS}$. The reason seems to be that $\bar\rho$ has a more "local" nature than $\bar\rho_{DS}$. To see this, note that the contribution of a path $\gamma_{xy} \ni e$ to $\bar\rho$ is just

    Q(e)^{-1} |\gamma_{xy}| \, \pi(x)\pi(y),

which depends only on the path length and on $Q(e)$, while its contribution to $\bar\rho_{DS}$ is

    Q(e)^{-1} |\gamma_{xy}|_{Q,e} \, \pi(x)\pi(y) = \sum_{e' \in \gamma_{xy}} Q(e')^{-1} \pi(x)\pi(y),

which depends on the capacities $Q(e')$ of all path edges. As we shall see, this makes the former quantity easier to use in applications where weights play a significant role, so that the capacities $Q(e)$ vary considerably: the problem with $\bar\rho_{DS}$ is that a path $\gamma_{xy}$ may pass through edges of very small capacity, so that $|\gamma_{xy}|_{Q,e}$ is much larger than $|\gamma_{xy}|$. If the Markov chain under consideration is random walk on a graph $G = (X, E)$, then $Q(e) = 1/2|E|$ for all $e$, so $|\gamma_{xy}|_{Q,e} = |\gamma_{xy}|$; hence the quantities $\bar\rho$ and $\bar\rho_{DS}$ coincide in this case. Most of the examples discussed by Diaconis and Stroock [8] are in fact random walks on graphs, so our bound yields identical results for them. Note also that $|\gamma_{xy}|_{Q,e} \ge 1$ for all $\gamma_{xy}$ and $e$, so certainly $\bar\rho_{DS}$ is bounded below by $\rho$. Hence the bounds of Theorem 5 and Corollary 6 can be worse than (6) by at most a factor $\ell$. Moreover, there are examples for which $\bar\rho$ is provably significantly better than $\bar\rho_{DS}$; one such is the Ehrenfest urn model, discussed by Diaconis and Stroock [8]. However, the two quantities seem to be incomparable in general. The examples in the next section and in [8] indicate that $\bar\rho$ frequently leads to sharper bounds on $\lambda_1$ than does $\rho$ itself. By way of contrast, here is a simple example where Corollary 4 provably does better than Theorem 5 and Corollary 6.
Consider the asymmetric random walk on the line $[0, N-1]$ with reflecting barriers, i.e., $X = \{0, 1, \ldots, N-1\}$ and transition probabilities given by $P(i-1, i) = p$, $P(i, i-1) = 1 - p$ for $0 < i < N$, and $P(0, 0) = 1 - p$, $P(N-1, N-1) = p$, where $p \in (0, \frac{1}{2})$ is a constant. This chain is reversible and ergodic, with stationary distribution $\pi(i) \propto r^i$, where $r = p/(1-p)$. In this case there is a unique simple path between each pair of states $i$ and $j$, so our choice of canonical paths is forced. Elementary calculations show that the quantity $Q(e)^{-1} \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y)$ is maximised on the edge $e = (\lceil N/2 \rceil - 1, \lceil N/2 \rceil)$, and that its value there is $\frac{1+r}{1-r}(1 + O(r^{N/2}))$. Hence Corollary 4 gives the bound

    \lambda_1 \le 1 - \frac{(1-r)^2}{8(1+r)^2} \big( 1 + O(r^{N/2}) \big).   (7)

The value of $\lambda_1$ for this chain is known exactly: it is $2(p(1-p))^{1/2} \cos(\pi/N) = \frac{2r^{1/2}}{1+r}(1 + O(N^{-2}))$. Hence (7) differs from the true value asymptotically by only a constant factor. On the other hand, a similar calculation considering the edge $(N-2, N-1)$ shows that $\bar\rho = (1+r)N + O(1)$. Thus Theorem 5 gives the bound

    \lambda_1 \le 1 - \frac{1}{(1+r)N} + O\Big(\frac{1}{N^2}\Big),

which is asymptotically much worse than (7).

3 Applications

In this section we discuss a series of complex Markov chains used in combinatorial applications whose mixing rate has previously been estimated using the conductance-based bounds of Theorem 2 or Corollary 4. In each case, we indicate the improvement in the lower bound on the spectral gap $1 - \lambda_1$ obtained using Corollary 6. By Proposition 1, this translates directly to a similar improvement in the mixing rate. As the precise arguments are combinatorially delicate, our present treatment will necessarily be very sketchy; for full details, the reader is urged to consult the stated references. The sharpened analysis of these Markov chains is of interest in its own right and is our main concern here. However, it also leads to improved estimates for the runtimes of various polynomial-time algorithms that make use of the chains.
In fact, since the runtime is dominated by the time needed to sample some number of structures from the stationary distribution, each algorithm is immediately speeded up by exactly the factor saved in the spectral gap. (The runtimes of the algorithms can be readily computed from the spectral gap and are not given explicitly here; details may be found in the references.) These improvements, though significant, are in most cases not sufficient to make the algorithms genuinely practical for large inputs. However, they do represent a tightening of the most intricate part of the analysis. There is undoubtedly room for refinement of other aspects of these algorithms, but such an investigation is beyond the scope of this paper.

(i) The monomer-dimer or all-matchings chain

Let $H = (V, A)$ be a weighted graph with positive edge weights $\{c(a) : a \in A\}$, and consider the Markov chain whose state space $X$ consists of all matchings in $H$, i.e., subsets $M \subseteq A$ such that no two edges in $M$ share an endpoint. Transitions from $M$ are made as follows: select an edge $a = \{u, v\}$ of $A$ uniformly at random, and then

(i) if $a \in M$, move to $M - a$ with probability $1/(1 + c(a))$;
(ii) if $u$ and $v$ are both unmatched in $M$, move to $M + a$ with probability $c(a)/(1 + c(a))$;
(iii) if $a' = \{u, w\} \in M$ for some $w$, and $v$ is unmatched in $M$, move to $(M + a) - a'$ with probability $c(a)/(c(a) + c(a'))$;
(iv) in all other cases, do nothing.

It is easy to check using (1) that this chain is reversible with stationary distribution $\pi(M) \propto w(M)$, where $w(M) = \prod_{a \in M} c(a)$ is the weight of the matching $M$. Simulating the chain therefore enables one to sample matchings randomly with probabilities approximately proportional to their weights. This has several important applications to the design of polynomial-time approximation algorithms for hard combinatorial enumeration problems.
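The transition rules (i)-(iv) and the reversibility claim can be checked mechanically on a tiny instance. The sketch below (the graph, the weights and all helper names are our own illustrative choices) builds the chain on a 3-vertex path and verifies detailed balance (1) against $\pi(M) \propto w(M)$:

```python
from itertools import combinations

# All-matchings chain on a tiny weighted graph, following rules (i)-(iv).
# The graph (a 3-vertex path) and the weights are an arbitrary test case.
A = [frozenset({0, 1}), frozenset({1, 2})]
c = {A[0]: 2.0, A[1]: 0.5}

def is_matching(S):
    verts = [v for e in S for v in e]
    return len(verts) == len(set(verts))

states = [frozenset(S) for k in range(len(A) + 1)
          for S in combinations(A, k) if is_matching(S)]

def step_probs(M):
    """Transition probabilities out of matching M."""
    probs = {s: 0.0 for s in states}
    for a in A:
        pick = 1.0 / len(A)               # probability of selecting edge a
        if a in M:                        # (i) delete a
            p = pick / (1.0 + c[a])
            probs[M - {a}] += p; probs[M] += pick - p
        else:
            matched = [e for e in M if e & a]
            if not matched:               # (ii) add a
                p = pick * c[a] / (1.0 + c[a])
                probs[M | {a}] += p; probs[M] += pick - p
            elif len(matched) == 1:       # (iii) slide a onto its neighbour
                a2 = matched[0]
                p = pick * c[a] / (c[a] + c[a2])
                probs[(M - {a2}) | {a}] += p; probs[M] += pick - p
            else:                         # (iv) do nothing
                probs[M] += pick
    return probs

def w(M):
    out = 1.0
    for a in M:
        out *= c[a]
    return out

Z = sum(w(M) for M in states)
pi = {M: w(M) / Z for M in states}
for M in states:                          # detailed balance (1)
    pM = step_probs(M)
    for M2 in states:
        assert abs(pi[M] * pM[M2] - pi[M2] * step_probs(M2)[M]) < 1e-12
```

Rule (iii) is implemented symmetrically in the two endpoints of $a$, since the edge $\{u, v\}$ is unordered.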
In statistical physics, $H$ describes a monomer-dimer system whose partition function is given by

    Z(H) = \sum_M w(M).   (8)

Weighted sampling of matchings enables $Z$ to be estimated accurately. The special case of (8) in which all edge weights are 1 corresponds to counting all matchings in $H$. Moreover, by varying the edge weights in $H$ in a suitable fashion and sampling matchings as above, it is possible to estimate the number of matchings in $H$ of a given size. In particular, for most graphs the number of perfect matchings can be estimated, a problem which corresponds to evaluating the permanent of a 0-1 matrix. Approximate counting of various other structures may be reduced to this problem [12, 20]. Finally, weighted sampling of matchings also enables a matching of nearly maximum cardinality to be found with high probability, an example of stochastic search by simulated annealing. Details of these applications may be found in [11, 23].

In order for the resulting algorithms to be efficient (in the sense of having polynomially bounded runtime), accurate sampling must be possible in time bounded by a polynomial in the size of $H$ and $c_{\max} = \max\{1, \max_{a \in A} c(a)\}$. By Proposition 1, this requires a bound on the second eigenvalue of the form $\lambda_1 \le 1 - 1/\mathrm{poly}(|H|, c_{\max})$. We now present a brief sketch of the canonical path argument used to obtain such a bound. In doing so, our aim will be to illustrate how the quantity $\rho$ arises naturally from a combinatorial encoding technique. For the details the reader is referred to [11, 23].

Let $I$ and $F$ be matchings in $H$, and consider the symmetric difference $S = I \oplus F$. The connected components of $S$ are paths and cycles in $H$ whose edges belong alternately to $I$ and $F$. The canonical path $\gamma_{IF}$ from $I$ to $F$ is determined as follows: order the components of $S$ according to some fixed underlying ordering on the paths and cycles of $H$; then "unwind" each component in turn, removing edges of $I$ and adding edges of $F$ using transitions of the Markov chain, in an obvious way.
Now let $e = (M, M')$ be an arbitrary oriented edge (transition) in the graph describing the Markov chain, and denote by $\mathrm{paths}(e)$ the set of canonical paths which pass through $e$. The key idea is to enumerate $\mathrm{paths}(e)$ using the states of the chain themselves. Specifically, we set up an injective mapping

    \eta_e : \mathrm{paths}(e) \to X,

so that each $\gamma_{IF} \in \mathrm{paths}(e)$ is encoded by a unique matching $\eta_e(I, F)$. Moreover, we do this in a way that preserves weights, i.e.,

    w(M)\, w(\eta_e(I, F)) \ge w(I)\, w(F).   (9)

(Essentially, we just take $\eta_e(I, F)$ to be the complement of $M$ in the multiset $I \cup F$, though we have to take care to ensure that $\eta_e(I, F)$ is indeed a matching.) Now summing (9) over pairs $I, F$ such that $\gamma_{IF} \in \mathrm{paths}(e)$, and recalling that $\pi(\cdot) \propto w(\cdot)$, we get

    \sum_{\gamma_{IF} \ni e} \pi(I)\pi(F) \ \le\ \pi(M) \sum_{\gamma_{IF} \ni e} \pi(\eta_e(I, F)) \ \le\ \pi(M),   (10)

since $\eta_e$ is injective. But $Q(e) = \pi(M) P(M, M')$, so (10) gives us an upper bound on the crucial quantity $Q(e)^{-1} \sum_{\gamma_{IF} \ni e} \pi(I)\pi(F)$, and hence on $\rho$. Precisely, the bound derived in [11, 23] by this method is $\rho \le 4|A| c_{\max}^2$. Corollary 4 therefore yields

    \lambda_1 \le 1 - \frac{1}{128 |A|^2 c_{\max}^4}.

On the other hand, the maximum length of any canonical path is easily seen to be at most $|V| = n$, so Corollary 6 gives the much sharper bound

    \lambda_1 \le 1 - \frac{1}{4 n |A| c_{\max}^2}.

The improvement in the spectral gap $1 - \lambda_1$, and hence in the mixing rate and the runtime of the algorithms mentioned above, is a factor of $32 |A| c_{\max}^2 n^{-1}$.

In the application to approximating the permanent, the largest value of $c_{\max}$ is the ratio of the number of "near-perfect" matchings to the number of perfect matchings in $H$. (A near-perfect matching is a matching in which precisely two vertices of $H$ are unmatched.) This quantity is at least $n/2$, and can be quite large in interesting cases: for example, for dense graphs (with minimum vertex degree at least $n/2$), the ratio is about $n^2$ and $|A| \le n^2/2$, leading to an improvement of $O(n^5)$; and [11] gives a bound on the ratio of $n^{10}$ for random graphs of low density.
(The ratio can in fact be exponentially large, but then the chain no longer converges in polynomial time.)

(ii) Broder's chain for the dense permanent

This chain, which was proposed in [5] and analysed in [11, 23], is a restricted version of Example (i); it again allows the number of perfect matchings in a graph to be estimated in polynomial time, provided the ratio of the number of near-perfect matchings to the number of perfect matchings is polynomially bounded. Let $H$ be an (unweighted) graph with $n$ vertices; the states of the chain are all perfect matchings and all near-perfect matchings in $H$. Transitions are made in similar fashion to Example (i) but without weights; the stationary distribution is uniform. Using canonical paths similar to those in Example (i), and the same encoding technique, it can be shown that $\rho = O(n^6)$, whence $1 - \lambda_1 = \Omega(n^{-12})$ by Corollary 4. However, since the maximum path length is at most $2n$, Corollary 6 yields the sharper bound $1 - \lambda_1 = \Omega(n^{-7})$. The mixing rate is therefore reduced by a factor $O(n^5)$. This is exactly the same improvement as that discussed in Section 4 of [8]: in this case the Diaconis-Stroock bound (6) is equivalent to Theorem 5 because there are no weights, i.e., $Q(e)$ is uniform.

(iii) The Ising model

In this example, drawn from statistical physics, the states of the Markov chain are all subgraphs of the graph $(V, A)$ of interactions of a ferromagnetic Ising system, i.e., all graphs $(V, A')$ where $A' \subseteq A$. (These graphs arise from the so-called high-temperature expansion of the partition function.) Transitions are made by random addition or deletion of individual edges with appropriate probabilities. The stationary distribution assigns to each subgraph the weight $\lambda^j \mu^k$, where $\lambda, \mu \in (0, 1)$ are parameters of the system, and $j, k$ are respectively the number of edges and the number of odd-degree vertices in the subgraph.
By sampling from this distribution, various important quantities, such as the partition function of the system, can be effectively approximated; the details are in [13]. In [13] a choice of canonical paths is presented for which it can be shown, again using the encoding technique sketched in Example (i), that $\rho \le 2|A|\mu^{-4}$. This leads via Corollary 4 to the bound $\lambda_1 \le 1 - \mu^8/32|A|^2$. The length of the paths here is at most $|A|$, so Corollary 6 yields the sharper bound $\lambda_1 \le 1 - \mu^4/2|A|^2$. The improvement in the spectral gap is therefore a factor $16\mu^{-4}$. In the applications discussed in [13], the parameter $\mu$ is taken down to $n^{-1}$, where $n = |V|$ is the number of sites in the system. Hence the improvement in the runtime is a factor $O(n^4)$.

(iv) Approximate counting and uniform generation

The Markov chain considered here is of a different flavour from those of Examples (i)-(iii). It is based on a tree which reflects an inductive construction of a class of combinatorial structures; the structures themselves correspond to leaves of the tree. The transition probabilities are determined by weights attached to the edges of the tree, which in turn are derived from crude estimates of the number of structures in the subtree below the edge. Simulation of the Markov chain allows the structures to be sampled from an almost uniform distribution, and indirectly enables one to bootstrap the crude counting estimates to arbitrarily precise estimates of the number of structures. For the details and some applications, see [23, 24].

In [23, 24] a direct argument gives the bound $\Phi \ge (4r^2 d)^{-1}$ for the conductance, where $d$ is the depth of the tree and $r \ge 1$ is the error factor allowed in the crude counting estimates. This in turn yields, by Theorem 2, $\lambda_1 \le 1 - (32 r^4 d^2)^{-1}$. On the other hand, using (the only possible) canonical paths we get $\rho \le 8 r^2 d$ and $\ell \le 2d$, which by Corollary 6 implies $\lambda_1 \le 1 - (16 r^2 d^2)^{-1}$. The improvement in the spectral gap is thus a factor $2r^2$.
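The arithmetic behind the factor $2r^2$ in Example (iv) is easy to verify directly. The following check (the sample values of $r$ and $d$ are arbitrary) compares the spectral-gap guarantees of Theorem 2 and Corollary 6:

```python
# Example (iv): Phi >= 1/(4 r^2 d) gives, via Theorem 2, a spectral gap
# of at least Phi^2 / 2 = 1/(32 r^4 d^2), while rho <= 8 r^2 d and
# ell <= 2 d give, via Corollary 6, a gap of at least
# 1/(rho * ell) = 1/(16 r^2 d^2).  The ratio of the two guarantees is
# the improvement factor 2 r^2.  (r, d values below are arbitrary.)
for r, d in [(1.5, 4.0), (10.0, 7.0)]:
    gap_thm2 = (1.0 / (4 * r * r * d)) ** 2 / 2
    gap_cor6 = 1.0 / ((8 * r * r * d) * (2 * d))
    assert abs(gap_thm2 - 1 / (32 * r**4 * d**2)) < 1e-12
    assert abs(gap_cor6 / gap_thm2 - 2 * r * r) < 1e-9
```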
4 Multicommodity flow

In this section we present a natural generalisation of the path-counting ideas of Section 2. We consider a multicommodity flow problem in the graph G describing a reversible Markov chain, and obtain upper bounds on λ₁ in terms of a measure on flows which is analogous to the measure ρ of Section 2. Moreover, there is also a matching lower bound on λ₁ in terms of this measure, so that it, like the conductance Φ, captures the mixing rate of a Markov chain rather closely.

As in Section 2, let G be the weighted graph describing a reversible Markov chain with stationary distribution π. Let us view G as a flow network by assigning to each oriented edge e of G the capacity Q(e). Now imagine that, for each ordered pair of distinct vertices x and y, a quantity π(x)π(y) of some commodity (x, y) is to be transported from x to y along the edges of the network. The object is to construct a flow which minimises the total throughput through any oriented edge e as a fraction of its capacity Q(e). This is entirely analogous to our previous measure ρ, except that we are now allowing multiple paths between states rather than canonical paths. Thus it is natural to suppose that our new measure will yield similar bounds on the mixing rate.

Formally, a flow in G is a function f : P → R⁺ which satisfies

    Σ_{p ∈ P_xy} f(p) = π(x)π(y)   for all x, y ∈ X, x ≠ y,

where P_xy is the set of all simple directed paths from x to y in G and P = ∪_{x≠y} P_xy. Now extend f to a function on oriented edges by setting

    f(e) = Σ_{p ∋ e} f(p),

i.e., f(e) is just the total flow routed by f through e. By analogy with the definition (3) of ρ, the quality of a flow f is measured by the quantity ρ(f), which is the maximum value over oriented edges e of the ratio f(e)/Q(e). Theorem 3 and Corollary 4 carry over immediately to this more general setting.
Theorem 3′ For any reversible Markov chain, and any flow f,

    Φ ≥ (2ρ(f))⁻¹.

Corollary 4′ For any reversible Markov chain, and any flow f, the second eigenvalue λ₁ satisfies

    λ₁ ≤ 1 − 1/(8ρ(f)²).

In order to generalise the measure ρ̄ from Section 2 to a flow f, define a function f̄ on oriented edges by

    f̄(e) = Σ_{p ∋ e} f(p)|p|,

where |p| is the number of edges in the path p. (We may think of f̄(e) as the elongated flow through e.) Now set ρ̄(f) = max_e f̄(e)/Q(e). The proof of Theorem 5 carries over almost unchanged, and the analogue of Corollary 6 is then immediate.

Theorem 5′ For any reversible Markov chain, and any flow f, the second eigenvalue λ₁ satisfies

    λ₁ ≤ 1 − 1/ρ̄(f).

Corollary 6′ For any reversible Markov chain, and any flow f, the second eigenvalue λ₁ satisfies

    λ₁ ≤ 1 − 1/(ρ(f)ℓ(f)),

where ℓ(f) is the length of a longest path p with f(p) > 0.

There are examples in which the extra flexibility provided by flows (as opposed to canonical paths) is necessary in order to achieve good bounds on the mixing rate. Consider random walk on the complete bipartite graph K_{2,N−2}, with vertex set X = {0, 1, …, N−1} and edges {0, i}, {1, i} for i = 2, 3, …, N−1, in which transitions from each vertex are made by choosing a neighbour uniformly at random. (This process is periodic, but can be made ergodic by adding a holding probability of 1/2 to every vertex.) In the stationary distribution each vertex occurs with probability proportional to its degree, and Q(e) = 1/(4(N−2)) for all edges e. It is easy to construct a flow f with ρ(f) = O(1), by distributing flow evenly over all shortest paths between each pair of vertices. By Corollary 4′ this gives an estimate for the spectral gap which is correct to within a constant factor. However, since π(0)π(1) = 1/16, it is clear that the best value for ρ (or ρ̄) obtainable using canonical paths is Ω(N), leading to the weak bound 1 − λ₁ = Ω(N⁻²) (or 1 − λ₁ = Ω(N⁻¹)). Here is a further example that illustrates the use of Theorem 5′.
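The claims about K_{2,N−2} can be checked numerically. The Python sketch below (an illustration, not part of the original analysis) builds the lazy walk with holding probability 1/2, so its edge capacities Q(e) = π(x)P(x, y) are half the value quoted above, constructs the even shortest-path flow, and confirms that ρ(f) is bounded by a small constant while the spectral gap equals 1/2:

```python
import numpy as np

N = 40                        # K_{2,N-2}: hubs 0 and 1, leaves 2..N-1

# lazy random walk: hold with probability 1/2, else move to a uniform neighbour
nbrs = {0: list(range(2, N)), 1: list(range(2, N))}
for i in range(2, N):
    nbrs[i] = [0, 1]
P = np.zeros((N, N))
for x in range(N):
    P[x, x] = 0.5
    for y in nbrs[x]:
        P[x, y] = 0.5 / len(nbrs[x])
deg = np.array([len(nbrs[x]) for x in range(N)], float)
pi = deg / deg.sum()          # stationary distribution proportional to degree

# flow pi(x)pi(y), spread evenly over all shortest paths, for each ordered pair
f = {}
def add(path, amt):
    for u, v in zip(path, path[1:]):
        f[(u, v)] = f.get((u, v), 0.0) + amt

for h in (0, 1):
    for i in range(2, N):
        add([h, i], pi[h] * pi[i])              # hub -> leaf: direct edge
        add([i, h], pi[i] * pi[h])              # leaf -> hub: direct edge
for (x, y) in [(0, 1), (1, 0)]:
    for i in range(2, N):                       # hub -> hub: N-2 two-step paths
        add([x, i, y], pi[x] * pi[y] / (N - 2))
for i in range(2, N):
    for j in range(2, N):
        if i != j:                              # leaf -> leaf: 2 two-step paths
            for h in (0, 1):
                add([i, h, j], pi[i] * pi[j] / 2)

rho = max(f[(u, v)] / (pi[u] * P[u, v]) for (u, v) in f)
gap = 1 - np.sort(np.linalg.eigvals(P).real)[-2]
```

For the lazy walk the eigenvalues are 1, 1/2 (with high multiplicity) and 0, so the gap is exactly 1/2, while ρ(f) stays below a constant (about 5/2) for all N.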
Consider the Bernoulli–Laplace diffusion model, whose state space X is the set of all k-element subsets of [n] = {0, 1, …, n−1}. Transitions are made from a given subset x ∈ X by selecting uniformly at random an element i of x and an element j of [n] − x, and replacing i by j in x. The stationary distribution here is uniform, π(x) = N⁻¹ for all x ∈ X, where N = (n choose k). Now let x, y be distinct elements of X, with |x ⊕ y| = 2m. We define a flow f by routing (N²m!²)⁻¹ units of flow from x to y along each of the m!² shortest paths (of length m) from x to y. (Each such path corresponds to an ordering of the m elements of x − y and an ordering of the m elements of y − x.) Now let e = (z, z′) be an arbitrary transition, with z′ = z ∪ {j} − {i}. To bound the flow through e, we again use the encoding technique sketched in Example (i) of Section 3. Let paths(e) denote the set of paths p ∋ e with f(p) > 0. We define a many-to-one mapping σ_e : paths(e) → X as follows: if p is a path from x to y, set σ_e(p) = x ⊕ y ⊕ z′. Note that σ_e(p) ⊕ z′ = x ⊕ y, so all paths p with a given image under σ_e have the same length m and carry the same flow (N²m!²)⁻¹, where 2m = |σ_e(p) ⊕ z′|. Moreover, the number of such paths is

    Σ_{r=0}^{m−1} (m−1 choose r)² r!² (m−r−1)!² = m(m−1)!².

(Here r corresponds to the distance along the path from x to z.) Thus the total elongated flow through e contributed by these paths is m²(m−1)!² (N²m!²)⁻¹ = N⁻². Finally, summing over images σ_e(p), and noting that the range of σ_e consists of all subsets that contain i and do not contain j, we see that

    f̄(e) = (1/N²) (n−2 choose k−1) = k(n−k) / (N n(n−1)).

But since Q(e) = (N k(n−k))⁻¹ for all edges e, we deduce that ρ̄(f) = k²(n−k)²/(n(n−1)). Theorem 5′ therefore yields the bound λ₁ ≤ 1 − n(n−1)/(k²(n−k)²). In the case n = 2k, the exact value is λ₁ = 1 − 2/k, so the estimate is correct to within a factor of about k/2. It seems difficult to get this close using canonical paths. A similar bound was obtained by a slightly different method in [8].
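For small parameters the Bernoulli–Laplace bound can be compared with the exact spectrum. This Python sketch (illustrative only) builds the full transition matrix for n = 6, k = 3, giving N = 20 states, and checks that the spectral gap equals 2/k = 2/3 and dominates the guarantee n(n−1)/k²(n−k)² from Theorem 5′:

```python
import numpy as np
from itertools import combinations

n, k = 6, 3
states = [frozenset(c) for c in combinations(range(n), k)]
idx = {s: a for a, s in enumerate(states)}

# transition: replace a uniform element i of x by a uniform element j outside x
P = np.zeros((len(states), len(states)))
for s in states:
    for i in s:
        for j in set(range(n)) - s:
            t = frozenset((s - {i}) | {j})
            P[idx[s], idx[t]] += 1.0 / (k * (n - k))

gap = 1 - np.sort(np.linalg.eigvals(P).real)[-2]       # exact spectral gap
flow_bound = n * (n - 1) / (k ** 2 * (n - k) ** 2)     # guarantee from Theorem 5'
```

Here flow_bound = 30/81 ≈ 0.37, while the exact gap is 2/3, in line with the factor-of-about-k/2 slack noted above.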
The above idea of using all geodesic paths first appeared in the analysis of a Markov chain on matchings by Dagum et al. [6].

By analogy with the conductance Φ, we may define the resistance† of a reversible Markov chain described by a graph G by

    ρ = ρ(G) = inf_f ρ(f),

where the infimum is taken over all valid flows in G. Corollary 4′ indicates that ρ provides a lower bound on the spectral gap of the form Ω(ρ⁻²). Thus, in particular, a family of Markov chains will be rapidly mixing if ρ is bounded above by a polynomial in the problem size. From Theorem 2 we already know that Φ characterises the rapid mixing property, since it measures the spectral gap up to a square. It is natural to ask whether ρ provides a similar characterisation.

†This terminology is chosen because, as we shall see shortly, ρ is almost the inverse of the conductance Φ. It should not be confused with the resistance of a graph familiar from electrical network theory.

This is indeed the case. We will show that, if a reversible Markov chain is close to equilibrium after τ steps, then it supports a flow f for which ρ(f) = O(τ). (Here and subsequently, N = |X| denotes the number of states.) Therefore, the chain is rapidly mixing if and only if ρ is bounded above by a polynomial in the problem size. In order to state this result (Theorem 8), we need to formalise the notion of the time for a chain to become close to equilibrium. With τ_x defined as in Section 2, set τ = max_{x∈X} τ_x(1/4); i.e., τ is the time taken for the variation distance from an arbitrary initial state to fall to 1/4. As will become clear, the value 1/4 is not significant and could be replaced by any sufficiently small constant.

First we need a simple technical lemma.

Lemma 7 For any t ≥ 2τ and all x, y ∈ X,

    P^t(x, y) ≥ π(y)/8.

Proof: Fix arbitrary states x, y ∈ X, and consider the set Z = {z ∈ X : P^τ(y, z) ≥ π(z)/2}. It is not hard to see that π(Z) ≥ 1/2, which by definition of τ ensures that P^t(x, Z) ≥ 1/4 for all t ≥ τ.
Now we have, for any t ≥ 2τ,

    P^t(x, y) ≥ Σ_{z∈Z} P^{t−τ}(x, z) P^τ(z, y)
              = π(y) Σ_{z∈Z} P^{t−τ}(x, z) P^τ(y, z)/π(z)
              ≥ (π(y)/2) Σ_{z∈Z} P^{t−τ}(x, z)
              ≥ π(y)/8,

where in the second line we have used the fact that π(y)P^τ(y, z) = π(z)P^τ(z, y), which follows from reversibility.

Theorem 8 The resistance ρ of an ergodic reversible Markov chain satisfies ρ ≤ 16τ, where τ is defined as above.

Proof: We show how to construct a flow f with the stated bound on ρ(f). Let t = 2τ. For a given state x, the choice of paths used to carry flow from x to other states is determined by the t-step evolution of the Markov chain itself, starting at x. More precisely, let P(t)_xy denote the set of all (not necessarily simple) paths of length exactly t from x to y in the underlying graph G, and for p ∈ P(t)_xy let prob(p) denote the probability that the Markov chain, starting in state x, makes the sequence of transitions defined by p. Note that Lemma 7 guarantees that P(t)_xy is non-empty for all x, y. Now for each x, y and p ∈ P(t)_xy, set

    f(p) = π(x)π(y) prob(p) / Σ_{p′∈P(t)_xy} prob(p′) = π(x)π(y) prob(p) / P^t(x, y),   (11)

and set f(p) = 0 for all other paths p ∈ P. Note that Σ_{p∈P(t)_xy} f(p) = π(x)π(y) for all x, y. Strictly speaking, f is not a flow according to our definition, since the paths in P(t)_xy are not necessarily simple; however, we can always obtain a flow f′ from f, without increasing the throughput through any edge, by simply bypassing the cycles on all paths.

We now proceed to estimate ρ(f′). From (11), the flow routed by f′ through e is

    f′(e) ≤ Σ_{x,y} Σ_{p∈P(t)_xy, p∋e} π(x)π(y) prob(p) / P^t(x, y)
          ≤ 8 Σ_{x,y} Σ_{p∈P(t)_xy, p∋e} π(x) prob(p),   (12)

where the second inequality follows from Lemma 7. Now the final double sum in (12) is precisely the probability that the Markov chain, when started in the stationary distribution over X, traverses the oriented edge e within t steps. But this probability is at most tQ(e), since the probability that the stationary process traverses e in any one step is precisely Q(e).
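Lemma 7 can be sanity-checked numerically on a toy chain. The Python sketch below (the lazy walk on a 5-cycle is an arbitrary illustrative choice) computes τ as the first time the worst-case variation distance falls to 1/4, then verifies the lemma's conclusion at t = 2τ:

```python
import numpy as np

# lazy random walk on a 5-cycle (a toy chain chosen for illustration)
n = 5
P = np.zeros((n, n))
for x in range(n):
    P[x, x] = 0.5
    P[x, (x + 1) % n] += 0.25
    P[x, (x - 1) % n] += 0.25
pi = np.full(n, 1.0 / n)            # uniform stationary distribution

# tau = first t at which max_x variation distance ||P^t(x,.) - pi|| <= 1/4
t, Pt = 1, P.copy()
while 0.5 * np.abs(Pt - pi).sum(axis=1).max() > 0.25:
    Pt = Pt @ P
    t += 1
tau = t

# Lemma 7 asserts P^t(x,y) >= pi(y)/8 for every t >= 2*tau
P2t = np.linalg.matrix_power(P, 2 * tau)
lemma_holds = bool((P2t >= pi / 8).all())
```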
Combining this observation with (12) yields

    ρ(f′) = max_e f′(e)/Q(e) ≤ 8t = 16τ,

as required.

Remarks: (a) An analogous result in terms of the elongated flow measure ρ̄ also holds: since the proof of Theorem 8 uses only paths of length t = 2τ, we have constructed a flow f for which ρ̄(f) = O(τ²).

(b) We have stated Theorem 8 in terms of the variation distance, for consistency with our earlier approach. It should be clear that similar formulations in terms of other measures are possible. For example, define rpd(t) = max_{x,y} |P^t(x, y) − π(y)|/π(y), the relative pointwise distance at time t maximised over initial states x. Then the proof of Theorem 8 shows that

    ρ ≤ (1 − rpd(t))⁻¹ t,

provided t is large enough that P^t(x, y) > 0 for all x, y ∈ X. A similar result has been observed independently by Jim Fill [10].

In fact, a direct comparison of ρ and Φ sheds some interesting light on these two quantities. It is convenient at this point to introduce a symmetrised version of Φ, namely

    Φ′ = min_{S⊂X, 0<π(S)<1} Q(S, S̄)/(π(S)π(S̄)).

Clearly Φ ≤ Φ′ ≤ 2Φ, so Φ and Φ′ differ by at most a constant factor. Inspection of the proof of Theorem 3 reveals that the marginally stronger bound

    ρ ≥ Φ′⁻¹   (13)

also holds. The reader will recognise this bound as nothing other than the trivial direction of the max-flow min-cut theorem for our multicommodity flow problem: the net flow across any cut cannot exceed the capacity of the cut.† In view of the well-known result for single-commodity flows, one might ask whether equality holds in (13). We have already seen an example where ρ = Φ′⁻¹, namely the asymmetric random walk of Section 2. The reason for this is that the underlying graph G is a tree, so there is a unique valid flow f, and it is easy to see that the only cuts we need consider in the definition of Φ′ are single edges.
Hence we have

    Φ′ = min_e Q(e)/(Σ_{x∈S} π(x) Σ_{y∈S̄} π(y)) = min_e Q(e)/f(e) = ρ⁻¹,

where (S, S̄) is the partition of X induced by the cut edge e.

The above question was extensively studied, in more generality and in a totally unrelated context, by Matula and Shahrokhi [18, 22]. The determination of ρ is a case of what Matula and Shahrokhi call the Maximum Concurrent Flow Problem, while computing Φ′ is a case of the Sparsest Cut Problem. Matula and Shahrokhi show that these two problems are "near-duals" of one another, and make this statement precise. They call graphs for which equality holds in (13) bottleneck graphs, and identify some examples. In our language, these include tree processes (i.e., Markov chains whose underlying graph G is a tree), and random walks on complete graphs, cycles and cubes. More significantly, they also exhibit examples for which equality definitely does not hold. To see that this can happen, consider the random walk on K_{2,N−2} discussed earlier. It is not hard to verify that Φ′ = 1 + O(N⁻²) (and Φ′ = 1 when N is even), but that the resistance ρ = 5/4 + O(N⁻¹). Asymptotically, therefore, ρ exceeds Φ′⁻¹ by a factor 5/4. Thus we are led to ask by how much ρ can exceed Φ′⁻¹ in general.

This question was addressed, again in a different context, by Leighton and Rao [17]. They show that, in the case of uniform flow between all pairs of vertices (i.e., in our language, when the stationary distribution is uniform: π(x) = N⁻¹ for all x ∈ X), ρ cannot exceed Φ′⁻¹ by more than a factor O(log N). This yields an approximate max-flow min-cut theorem for the (uniform) multicommodity flow problem.

†Note that our problem can be recast in more conventional flow maximisation terms as follows: determine the maximum value of F such that Fπ(x)π(y) units of commodity (x, y) can be transported from x to y, for every pair x, y, and such that the flow through any edge e does not exceed its capacity Q(e). The maximum such F is precisely ρ⁻¹.
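The value Φ′ = 1 for K_{2,N−2} with N even can be confirmed by brute force over all cuts. The Python sketch below is illustrative; it uses the non-lazy capacities Q(e) = 1/(4(N−2)) quoted earlier and evaluates the defining minimum directly for N = 8:

```python
import numpy as np
from itertools import combinations

N = 8                                   # K_{2,N-2}, N even: hubs 0 and 1, leaves 2..N-1
deg = [N - 2, N - 2] + [2] * (N - 2)
pi = np.array(deg, float) / sum(deg)
edges = [(h, i) for h in (0, 1) for i in range(2, N)]
Q = 1.0 / (4 * (N - 2))                 # Q(e), identical for every edge (as in the text)

# Phi' = min over proper subsets S of Q(S, S-bar) / (pi(S) pi(S-bar))
best = float("inf")
for r in range(1, N):
    for S in combinations(range(N), r):
        Sset = set(S)
        cut = sum(Q for (u, v) in edges if (u in Sset) != (v in Sset))
        pS = pi[list(S)].sum()
        best = min(best, cut / (pS * (1 - pS)))
phi_sym = best
```

The minimising cut takes one hub together with half the leaves, giving π(S) = 1/2 and ratio exactly 1.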
Moreover, this bound is tight, since ρ = Ω(Φ⁻¹ log N) for random walk on an expander graph of constant degree: here Φ is constant, but ρ = Ω(log N), since the distance between most pairs of vertices is Ω(log N) and the graph has only O(N) edges. As has been observed by Éva Tardos, Leighton and Rao's result actually holds in the more general situation where the flow between each pair x, y is of the form v(x)v(y) for any fixed function v : X → R⁺. In our setting, this leads to the following result for arbitrary stationary distributions; since the proof is essentially the same as that of [17, Theorem 1] we omit it here.

Theorem 9 For any reversible Markov chain with N states,

    ρ = O(Φ⁻¹ log N).

Putting together Theorem 9 and Theorem 2, we obtain a lower bound on the second eigenvalue in terms of ρ.

Corollary 10 For any reversible Markov chain with N states, the second eigenvalue λ₁ satisfies

    λ₁ ≥ 1 − O(ρ⁻¹ log N).

As far as bounds on the mixing rate are concerned, Theorem 8 is rather stronger than Corollary 10: Theorem 8 says that τ = Ω(ρ), whereas Corollary 10, in conjunction with Proposition 1(ii), gives the weaker bound τ = Ω(ρ/log N). However, Theorem 9 and Corollary 10 seem to be of interest in their own right.

Acknowledgements

It is a pleasure to thank David Aldous, Persi Diaconis, Satish Rao, Alan Sokal, Éva Tardos and Umesh Vazirani for helpful discussions on the ideas presented here, and Mark Jerrum both for helpful discussions and for his comments on an earlier version of the paper.

References

[1] Aldous, D. Some inequalities for reversible Markov chains. Journal of the London Mathematical Society (2) 25 (1982), pp. 564–576.
[2] Aldous, D. On the Markov chain simulation method for uniform combinatorial distributions and simulated annealing. Probability in the Engineering and Informational Sciences 1 (1987), pp. 33–46.
[3] Alon, N. Eigenvalues and expanders. Combinatorica 6 (1986), pp. 83–96.
[4] Alon, N. and Milman, V.D. λ₁, isoperimetric inequalities for graphs and superconcentrators.
Journal of Combinatorial Theory, Series B 38 (1985), pp. 73–88.
[5] Broder, A.Z. How hard is it to marry at random? (On the approximation of the permanent). Proceedings of the 18th ACM Symposium on Theory of Computing, 1986, pp. 50–58. Erratum in Proceedings of the 20th ACM Symposium on Theory of Computing, 1988, p. 551.
[6] Dagum, P., Luby, M., Mihail, M. and Vazirani, U.V. Polytopes, permanents and graphs with large factors. Proceedings of the 29th IEEE Symposium on Foundations of Computer Science (1988), pp. 412–421.
[7] Diaconis, P. Group representations in probability and statistics. Lecture Notes-Monograph Series Vol. 11, Institute of Mathematical Statistics, Hayward, California, 1988.
[8] Diaconis, P. and Stroock, D. Geometric bounds for eigenvalues of Markov chains. Annals of Applied Probability 1 (1991), pp. 36–61.
[9] Dyer, M., Frieze, A. and Kannan, R. A random polynomial time algorithm for approximating the volume of convex bodies. Proceedings of the 21st ACM Symposium on Theory of Computing (1989), pp. 375–381.
[10] Fill, J. Unpublished manuscript.
[11] Jerrum, M.R. and Sinclair, A.J. Approximating the permanent. SIAM Journal on Computing 18 (1989), pp. 1149–1178.
[12] Jerrum, M.R. and Sinclair, A.J. Fast uniform generation of regular graphs. Theoretical Computer Science 73 (1990), pp. 91–100.
[13] Jerrum, M.R. and Sinclair, A.J. Polynomial-time approximation algorithms for the Ising model. Technical Report CSR-1-90, Dept. of Computer Science, University of Edinburgh. To appear in SIAM Journal on Computing; extended abstract in Proceedings of the 17th International Colloquium on Automata, Languages and Programming (1990), Springer LNCS Vol. 443, pp. 462–475.
[14] Karzanov, A. and Khachiyan, L. On the conductance of order Markov chains. Technical Report DCS 268, Rutgers University, June 1990.
[15] Keilson, J. Markov chain models: rarity and exponentiality. Springer-Verlag, New York, 1979.
[16] Lawler, G.F. and Sokal, A.D.
Bounds on the L² spectrum for Markov chains and Markov processes: a generalization of Cheeger's inequality. Transactions of the American Mathematical Society 309 (1988), pp. 557–580.
[17] Leighton, T. and Rao, S. An approximate max-flow min-cut theorem for uniform multicommodity flow problems with applications to approximation algorithms. Proceedings of the 29th IEEE Symposium on Foundations of Computer Science (1988), pp. 422–431.
[18] Matula, D.W. and Shahrokhi, F. Sparsest cuts and bottlenecks in graphs. Discrete Applied Mathematics 27 (1990), pp. 113–123.
[19] Mihail, M. Conductance and convergence of Markov chains: a combinatorial treatment of expanders. Proceedings of the 30th IEEE Symposium on Foundations of Computer Science (1989), pp. 526–531.
[20] Mihail, M. and Winkler, P. On the number of Eulerian orientations of a graph. Proceedings of the 3rd ACM-SIAM Symposium on Discrete Algorithms (1992), pp. 138–145.
[21] Mohar, B. Isoperimetric numbers of graphs. Journal of Combinatorial Theory, Series B 47 (1989), pp. 274–291.
[22] Shahrokhi, F. and Matula, D.W. The maximum concurrent flow problem. Journal of the ACM 37 (1990), pp. 318–334.
[23] Sinclair, A.J. Algorithms for random generation and counting: a Markov chain approach. PhD Thesis, University of Edinburgh, June 1988. To appear as a monograph in the series Progress in Theoretical Computer Science, Birkhäuser, Boston, 1992.
[24] Sinclair, A.J. and Jerrum, M.R. Approximate counting, uniform generation and rapidly mixing Markov chains. Information and Computation 82 (1989), pp. 93–133.
Publication date: 1992